{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 你的第一个神经网络\n",
    "\n",
    "在此项目中，你将构建你的第一个神经网络，并用该网络预测每日自行车租客人数。我们提供了一些代码，但是需要你来实现神经网络（大部分内容）。提交此项目后，欢迎进一步探索该数据和模型。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "%config InlineBackend.figure_format = 'retina'\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 加载和准备数据\n",
    "\n",
    "构建神经网络的关键一步是正确地准备数据。不同尺度级别的变量使网络难以高效地掌握正确的权重。我们在下方已经提供了加载和准备数据的代码。你很快将进一步学习这些代码！"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "data_path = 'Bike-Sharing-Dataset/hour.csv'\n",
    "\n",
    "rides = pd.read_csv(data_path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "rides.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 数据简介\n",
    "\n",
    "此数据集包含的是从 2011 年 1 月 1 日到 2012 年 12 月 31 日期间每天每小时的骑车人数。骑车用户分成临时用户和注册用户，cnt 列是骑车用户数汇总列。你可以在上方看到前几行数据。\n",
    "\n",
    "下图展示的是数据集中前 10 天左右的骑车人数（某些天不一定是 24 个条目，所以不是精确的 10 天）。你可以在这里看到每小时租金。这些数据很复杂！周末的骑行人数少些，工作日上下班期间是骑行高峰期。我们还可以从上方的数据中看到温度、湿度和风速信息，所有这些信息都会影响骑行人数。你需要用你的模型展示所有这些数据。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "rides[:24*10].plot(x='dteday', y='cnt')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 虚拟变量（哑变量）\n",
    "\n",
    "下面是一些分类变量，例如季节、天气、月份。要在我们的模型中包含这些数据，我们需要创建二进制虚拟变量。用 Pandas 库中的 `get_dummies()` 就可以轻松实现。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "dummy_fields = ['season', 'weathersit', 'mnth', 'hr', 'weekday']\n",
    "for each in dummy_fields:\n",
    "    dummies = pd.get_dummies(rides[each], prefix=each, drop_first=False)\n",
    "    rides = pd.concat([rides, dummies], axis=1)\n",
    "\n",
    "fields_to_drop = ['instant', 'dteday', 'season', 'weathersit', \n",
    "                  'weekday', 'atemp', 'mnth', 'workingday', 'hr']\n",
    "data = rides.drop(fields_to_drop, axis=1)\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 调整目标变量\n",
    "\n",
    "为了更轻松地训练网络，我们将对每个连续变量标准化，即转换和调整变量，使它们的均值为 0，标准差为 1。\n",
    "\n",
    "我们会保存换算因子，以便当我们使用网络进行预测时可以还原数据。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "quant_features = ['casual', 'registered', 'cnt', 'temp', 'hum', 'windspeed']\n",
    "# Store scalings in a dictionary so we can convert back later\n",
    "scaled_features = {}\n",
    "for each in quant_features:\n",
    "    mean, std = data[each].mean(), data[each].std()\n",
    "    scaled_features[each] = [mean, std]\n",
    "    data.loc[:, each] = (data[each] - mean)/std"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 将数据拆分为训练、测试和验证数据集\n",
    "\n",
    "我们将大约最后 21 天的数据保存为测试数据集，这些数据集会在训练完网络后使用。我们将使用该数据集进行预测，并与实际的骑行人数进行对比。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Save data for approximately the last 21 days \n",
    "test_data = data[-21*24:]\n",
    "\n",
    "# Now remove the test data from the data set \n",
    "data = data[:-21*24]\n",
    "\n",
    "# Separate the data into features and targets\n",
    "target_fields = ['cnt', 'casual', 'registered']\n",
    "features, targets = data.drop(target_fields, axis=1), data[target_fields]\n",
    "test_features, test_targets = test_data.drop(target_fields, axis=1), test_data[target_fields]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们将数据拆分为两个数据集，一个用作训练，一个在网络训练完后用来验证网络。因为数据是有时间序列特性的，所以我们用历史数据进行训练，然后尝试预测未来数据（验证数据集）。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Hold out the last 60 days or so of the remaining data as a validation set\n",
    "train_features, train_targets = features[:-60*24], targets[:-60*24]\n",
    "val_features, val_targets = features[-60*24:], targets[-60*24:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 开始构建网络\n",
    "\n",
    "下面你将构建自己的网络。我们已经构建好结构和反向传递部分。你将实现网络的前向传递部分。还需要设置超参数：学习速率、隐藏单元的数量，以及训练传递数量。\n",
    "\n",
    "<img src=\"assets/neural_network.png\" width=300px>\n",
    "\n",
    "该网络有两个层级，一个隐藏层和一个输出层。隐藏层级将使用 S 型函数作为激活函数。输出层只有一个节点，用于递归，节点的输出和节点的输入相同。即激活函数是 $f(x)=x$。这种函数获得输入信号，并生成输出信号，但是会考虑阈值，称为激活函数。我们完成网络的每个层级，并计算每个神经元的输出。一个层级的所有输出变成下一层级神经元的输入。这一流程叫做前向传播（forward propagation）。\n",
    "\n",
    "我们在神经网络中使用权重将信号从输入层传播到输出层。我们还使用权重将错误从输出层传播回网络，以便更新权重。这叫做反向传播（backpropagation）。\n",
    "\n",
    "> **提示**：你需要为反向传播实现计算输出激活函数 ($f(x) = x$) 的导数。如果你不熟悉微积分，其实该函数就等同于等式 $y = x$。该等式的斜率是多少？也就是导数 $f(x)$。\n",
    "\n",
    "\n",
    "你需要完成以下任务：\n",
    "\n",
    "1. 实现 S 型激活函数。将 `__init__` 中的 `self.activation_function`  设为你的 S 型函数。\n",
    "2. 在 `train` 方法中实现前向传递。\n",
    "3. 在 `train` 方法中实现反向传播算法，包括计算输出错误。\n",
    "4. 在 `run` 方法中实现前向传递。\n",
    "\n",
    "  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "class NeuralNetwork(object):\n",
    "    def __init__(self, input_nodes, hidden_nodes, output_nodes, learning_rate):\n",
    "        # Set number of nodes in input, hidden and output layers.\n",
    "        self.input_nodes = input_nodes\n",
    "        self.hidden_nodes = hidden_nodes\n",
    "        self.output_nodes = output_nodes\n",
    "\n",
    "        # Initialize weights\n",
    "        self.weights_input_to_hidden = np.random.normal(0.0, self.input_nodes**-0.5, \n",
    "                                       (self.input_nodes, self.hidden_nodes))\n",
    "\n",
    "        self.weights_hidden_to_output = np.random.normal(0.0, self.hidden_nodes**-0.5, \n",
    "                                       (self.hidden_nodes, self.output_nodes))\n",
    "        self.lr = learning_rate\n",
    "        \n",
    "        #### TODO: Set self.activation_function to your implemented sigmoid function ####\n",
    "        #\n",
    "        # Note: in Python, you can define a function with a lambda expression,\n",
    "        # as shown below.\n",
    "        self.activation_function = lambda x : 0  # Replace 0 with your sigmoid calculation.\n",
    "        \n",
    "        ### If the lambda code above is not something you're familiar with,\n",
    "        # You can uncomment out the following three lines and put your \n",
    "        # implementation there instead.\n",
    "        #\n",
    "        #def sigmoid(x):\n",
    "        #    return 0  # Replace 0 with your sigmoid calculation here\n",
    "        #self.activation_function = sigmoid\n",
    "                    \n",
    "    \n",
    "    def train(self, features, targets):\n",
    "        ''' Train the network on batch of features and targets. \n",
    "        \n",
    "            Arguments\n",
    "            ---------\n",
    "            \n",
    "            features: 2D array, each row is one data record, each column is a feature\n",
    "            targets: 1D array of target values\n",
    "        \n",
    "        '''\n",
    "        n_records = features.shape[0]\n",
    "        delta_weights_i_h = np.zeros(self.weights_input_to_hidden.shape)\n",
    "        delta_weights_h_o = np.zeros(self.weights_hidden_to_output.shape)\n",
    "        for X, y in zip(features, targets):\n",
    "            #### Implement the forward pass here ####\n",
    "            ### Forward pass ###\n",
    "            # TODO: Hidden layer - Replace these values with your calculations.\n",
    "            hidden_inputs = None # signals into hidden layer\n",
    "            hidden_outputs = None # signals from hidden layer\n",
    "\n",
    "            # TODO: Output layer - Replace these values with your calculations.\n",
    "            final_inputs = None # signals into final output layer\n",
    "            final_outputs = None # signals from final output layer\n",
    "            \n",
    "            #### Implement the backward pass here ####\n",
    "            ### Backward pass ###\n",
    "\n",
    "            # TODO: Output error - Replace this value with your calculations.\n",
    "            error = None # Output layer error is the difference between desired target and actual output.\n",
    "            \n",
    "            # TODO: Calculate the hidden layer's contribution to the error\n",
    "            hidden_error = None\n",
    "            \n",
    "            # TODO: Backpropagated error terms - Replace these values with your calculations.\n",
    "            output_error_term = None\n",
    "            hidden_error_term = None\n",
    "\n",
    "            # Weight step (input to hidden)\n",
    "            delta_weights_i_h += None\n",
    "            # Weight step (hidden to output)\n",
    "            delta_weights_h_o += None\n",
    "\n",
    "        # TODO: Update the weights - Replace these values with your calculations.\n",
    "        self.weights_hidden_to_output += None # update hidden-to-output weights with gradient descent step\n",
    "        self.weights_input_to_hidden += None # update input-to-hidden weights with gradient descent step\n",
    " \n",
    "    def run(self, features):\n",
    "        ''' Run a forward pass through the network with input features \n",
    "        \n",
    "            Arguments\n",
    "            ---------\n",
    "            features: 1D array of feature values\n",
    "        '''\n",
    "        \n",
    "        #### Implement the forward pass here ####\n",
    "        # TODO: Hidden layer - replace these values with the appropriate calculations.\n",
    "        hidden_inputs = None # signals into hidden layer\n",
    "        hidden_outputs = None # signals from hidden layer\n",
    "        \n",
    "        # TODO: Output layer - Replace these values with the appropriate calculations.\n",
    "        final_inputs = None # signals into final output layer\n",
    "        final_outputs = None # signals from final output layer \n",
    "        \n",
    "        return final_outputs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def MSE(y, Y):\n",
    "    return np.mean((y-Y)**2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 单元测试\n",
    "\n",
    "运行这些单元测试，检查你的网络实现是否正确。这样可以帮助你确保网络已正确实现，然后再开始训练网络。这些测试必须成功才能通过此项目。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import unittest\n",
    "\n",
    "inputs = np.array([[0.5, -0.2, 0.1]])\n",
    "targets = np.array([[0.4]])\n",
    "test_w_i_h = np.array([[0.1, -0.2],\n",
    "                       [0.4, 0.5],\n",
    "                       [-0.3, 0.2]])\n",
    "test_w_h_o = np.array([[0.3],\n",
    "                       [-0.1]])\n",
    "\n",
    "class TestMethods(unittest.TestCase):\n",
    "    \n",
    "    ##########\n",
    "    # Unit tests for data loading\n",
    "    ##########\n",
    "    \n",
    "    def test_data_path(self):\n",
    "        # Test that file path to dataset has been unaltered\n",
    "        self.assertTrue(data_path.lower() == 'bike-sharing-dataset/hour.csv')\n",
    "        \n",
    "    def test_data_loaded(self):\n",
    "        # Test that data frame loaded\n",
    "        self.assertTrue(isinstance(rides, pd.DataFrame))\n",
    "    \n",
    "    ##########\n",
    "    # Unit tests for network functionality\n",
    "    ##########\n",
    "\n",
    "    def test_activation(self):\n",
    "        network = NeuralNetwork(3, 2, 1, 0.5)\n",
    "        # Test that the activation function is a sigmoid\n",
    "        self.assertTrue(np.all(network.activation_function(0.5) == 1/(1+np.exp(-0.5))))\n",
    "\n",
    "    def test_train(self):\n",
    "        # Test that weights are updated correctly on training\n",
    "        network = NeuralNetwork(3, 2, 1, 0.5)\n",
    "        network.weights_input_to_hidden = test_w_i_h.copy()\n",
    "        network.weights_hidden_to_output = test_w_h_o.copy()\n",
    "        \n",
    "        network.train(inputs, targets)\n",
    "        self.assertTrue(np.allclose(network.weights_hidden_to_output, \n",
    "                                    np.array([[ 0.37275328], \n",
    "                                              [-0.03172939]])))\n",
    "        self.assertTrue(np.allclose(network.weights_input_to_hidden,\n",
    "                                    np.array([[ 0.10562014, -0.20185996], \n",
    "                                              [0.39775194, 0.50074398], \n",
    "                                              [-0.29887597, 0.19962801]])))\n",
    "\n",
    "    def test_run(self):\n",
    "        # Test correctness of run method\n",
    "        network = NeuralNetwork(3, 2, 1, 0.5)\n",
    "        network.weights_input_to_hidden = test_w_i_h.copy()\n",
    "        network.weights_hidden_to_output = test_w_h_o.copy()\n",
    "\n",
    "        self.assertTrue(np.allclose(network.run(inputs), 0.09998924))\n",
    "\n",
    "suite = unittest.TestLoader().loadTestsFromModule(TestMethods())\n",
    "unittest.TextTestRunner().run(suite)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 训练网络\n",
    "\n",
    "现在你将设置网络的超参数。策略是设置的超参数使训练集上的错误很小但是数据不会过拟合。如果网络训练时间太长，或者有太多的隐藏节点，可能就会过于针对特定训练集，无法泛化到验证数据集。即当训练集的损失降低时，验证集的损失将开始增大。\n",
    "\n",
    "你还将采用随机梯度下降 (SGD) 方法训练网络。对于每次训练，都获取随机样本数据，而不是整个数据集。与普通梯度下降相比，训练次数要更多，但是每次时间更短。这样的话，网络训练效率更高。稍后你将详细了解 SGD。\n",
    "\n",
    "\n",
    "### 选择迭代次数\n",
    "\n",
    "也就是训练网络时从训练数据中抽样的批次数量。迭代次数越多，模型就与数据越拟合。但是，如果迭代次数太多，模型就无法很好地泛化到其他数据，这叫做过拟合。你需要选择一个使训练损失很低并且验证损失保持中等水平的数字。当你开始过拟合时，你会发现训练损失继续下降，但是验证损失开始上升。\n",
    "\n",
    "### 选择学习速率\n",
    "\n",
    "速率可以调整权重更新幅度。如果速率太大，权重就会太大，导致网络无法与数据相拟合。建议从 0.1 开始。如果网络在与数据拟合时遇到问题，尝试降低学习速率。注意，学习速率越低，权重更新的步长就越小，神经网络收敛的时间就越长。\n",
    "\n",
    "\n",
    "### 选择隐藏节点数量\n",
    "\n",
    "隐藏节点越多，模型的预测结果就越准确。尝试不同的隐藏节点的数量，看看对性能有何影响。你可以查看损失字典，寻找网络性能指标。如果隐藏单元的数量太少，那么模型就没有足够的空间进行学习，如果太多，则学习方向就有太多的选择。选择隐藏单元数量的技巧在于找到合适的平衡点。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import sys\n",
    "\n",
    "### Set the hyperparameters here ###\n",
    "iterations = 100\n",
    "learning_rate = 0.1\n",
    "hidden_nodes = 2\n",
    "output_nodes = 1\n",
    "\n",
    "N_i = train_features.shape[1]\n",
    "network = NeuralNetwork(N_i, hidden_nodes, output_nodes, learning_rate)\n",
    "\n",
    "losses = {'train':[], 'validation':[]}\n",
    "for ii in range(iterations):\n",
    "    # Go through a random batch of 128 records from the training data set\n",
    "    batch = np.random.choice(train_features.index, size=128)\n",
    "    X, y = train_features.ix[batch].values, train_targets.ix[batch]['cnt']\n",
    "                             \n",
    "    network.train(X, y)\n",
    "    \n",
    "    # Printing out the training progress\n",
    "    train_loss = MSE(network.run(train_features).T, train_targets['cnt'].values)\n",
    "    val_loss = MSE(network.run(val_features).T, val_targets['cnt'].values)\n",
    "    sys.stdout.write(\"\\rProgress: {:2.1f}\".format(100 * ii/float(iterations)) \\\n",
    "                     + \"% ... Training loss: \" + str(train_loss)[:5] \\\n",
    "                     + \" ... Validation loss: \" + str(val_loss)[:5])\n",
    "    sys.stdout.flush()\n",
    "    \n",
    "    losses['train'].append(train_loss)\n",
    "    losses['validation'].append(val_loss)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plt.plot(losses['train'], label='Training loss')\n",
    "plt.plot(losses['validation'], label='Validation loss')\n",
    "plt.legend()\n",
    "_ = plt.ylim()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 检查预测结果\n",
    "\n",
    "使用测试数据看看网络对数据建模的效果如何。如果完全错了，请确保网络中的每步都正确实现。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(figsize=(8,4))\n",
    "\n",
    "mean, std = scaled_features['cnt']\n",
    "predictions = network.run(test_features).T*std + mean\n",
    "ax.plot(predictions[0], label='Prediction')\n",
    "ax.plot((test_targets['cnt']*std + mean).values, label='Data')\n",
    "ax.set_xlim(right=len(predictions))\n",
    "ax.legend()\n",
    "\n",
    "dates = pd.to_datetime(rides.ix[test_data.index]['dteday'])\n",
    "dates = dates.apply(lambda d: d.strftime('%b %d'))\n",
    "ax.set_xticks(np.arange(len(dates))[12::24])\n",
    "_ = ax.set_xticklabels(dates[12::24], rotation=45)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 可选：思考下你的结果（我们不会评估这道题的答案）\n",
    "\n",
    " \n",
    "请针对你的结果回答以下问题。模型对数据的预测效果如何？哪里出现问题了？为何出现问题呢？\n",
    "\n",
    "> **注意**：你可以通过双击该单元编辑文本。如果想要预览文本，请按 Control + Enter\n",
    "\n",
    "#### 请将你的答案填写在下方\n"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
